MARK-UP LANGUAGE FRAMEWORK WITH GENERIC HANDLING
DATA SOURCES



DESCRIPTION OF THE PRIOR ART 

Mark-up language is a set of codes in a text file that enable a computing device to format
the text for correct display or printing. A client (i.e. any process that requests a service
from another process) in a software system creates mark-up language using a 'generator'.
It reads and interprets mark-up language using a 'parser'.

In the prior art, parsers and generators have been specific to certain kinds of mark-up
languages. For example, a client could use an XML (extensible mark-up language) parser
to interpret and handle XML files; it could use a separate WBXML (WAP binary XML)
parser to interpret and handle WBXML files. In each case, the client would talk directly to
each parser.

When the client needs to generate mark-up language format files, there could be an XML
generator and a separate WBXML generator. Again, the client would talk directly to each
generator.

In prior art systems, the kind of data source that was used by the parser or generator was
fixed; parsers and generators were hard-coded to use a specific type of source - e.g. a
buffer in memory.

SUMMARY OF THE PRESENT INVENTION

The present invention is a portable computing device programmed with a mark-up
language parser or generator that can access data from a source using a generic data
supplier API.



Hence, the parser or generator is insulated from having to talk directly to a data source;
instead, it does so via a generic data supplier API, acting as an intermediary layer. This
de-couples the parser or generator from the data source and hence means that the parser
or generator no longer has to be hard-coded for a specific data supplier. This in turn
leads to a simplification of the parser and generator design.

The present invention allows parsing and generation to be carried out with any data
source. For example, a buffer in memory could be used, as could a file, as could
streaming from a socket (hence enabling the ability to parse in real-time from data
streamed over the internet). There is no requirement to define, at parser/generator build
time, what particular data source will be used. Instead, the system allows any source that
can use the generic data supplier API to be adopted. New types of data sources can be
utilised by computing devices, even after those devices have been shipped to end-users.

DETAILED DESCRIPTION
Overview of Key Features 

The present invention is implemented in a system called the Mark-Up Language 
Framework, used in SymbianOS from Symbian Limited, London, United Kingdom. 
SymbianOS is an operating system for smart phones and advanced mobile telephones,
and other kinds of portable computing devices. 

The Mark-up Language framework implements three key features. 

1. Generic Parser API

Clients are separated from mark-up language parsers/generators by an intermediary layer
that (a) insulates the client from having to communicate directly with the parser or
generator and (b) is generic in that it presents a common API to the client irrespective of
the specific kind of parser or generator the intermediary layer interfaces with.

2. Data validation/pre-filtering and altering components in a chain of
responsibility

Mark-up language parsers or generators can access components to validate, pre-filter or
alter data; the components are plug-in components that operate using a 'chain of
responsibility' design pattern.

3. Generic Data Supplier API

The mark-up language parsers or generators can access data from a source using a
generic data supplier API, insulating the parser or generator from having to
communicate directly with the data source.

Each of these features will now be discussed in more detail.

1. Generic Parser Intermediary Layer 

The essence of this approach is that the client interfaces with a mark-up language
parser or a generator via an intermediary layer that (a) insulates the client from having to
communicate directly with the parser or generator and (b) is generic in that it presents a
common API to the client irrespective of the specific kind of parser or generator the
intermediary layer interfaces with.

In this way, the client is no longer tied to a single kind of parser or generator; it can
operate with any different kind of parser compatible with the intermediary layer, yet it
remains far simpler than prior art clients that are hard-coded to operate directly with
several different kinds of parsers and generators.

The API is typically implemented as a header file. In an implementation, the
intermediary layer acts as an extensible framework and the parsers and generators are
themselves plug-ins to that framework. The present invention may hence readily allow
the device to operate with different kinds of parsers and generators; this extensibility is
impossible to achieve with prior art hard-coded systems.

The specific kind of parser or generator being used is not known to the client: the
intermediary layer fully insulates the client from needing to be aware of these specifics.
Instead, the client deals only with the intermediary layer, which presents to the client as a
generic parser or a generic generator - i.e. a parser or generator which behaves in a way
that is common to all parsers or generators.

For example, the SyncML protocol supports both XML and WBXML. By using both
XML and WBXML parser and generator plug-ins in the framework, a SyncML client
can use either or both types of parser and generator without knowing about the type of
mark-up language; as a result, the design of the SyncML client is greatly simplified. Since
WBXML and XML are quite different in the way they represent their data, one very
useful feature of the invention is the mapping of WBXML tokens to a string in a static
string pool table. Appendix B expands on this idea.
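The intermediary layer described above can be illustrated with a minimal sketch in standard C++. It is not the SymbianOS API: the names ParserPlugin, ParserFramework, EventSink, XmlStubParser and TokenStubParser are hypothetical, the two "parsers" are deliberately trivial stubs, and the token values are made up. The point is only that the client selects a parser by MIME type and receives the same events either way.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event sink: the client only ever sees element names as text.
struct EventSink {
    std::vector<std::string> elements;
    void onStartElement(const std::string& name) { elements.push_back(name); }
};

// Hypothetical common interface that every parser plug-in implements.
class ParserPlugin {
public:
    virtual ~ParserPlugin() = default;
    virtual void parse(const std::string& doc, EventSink& sink) = 0;
};

// Stub textual plug-in: reports the text inside each <...> start tag.
class XmlStubParser : public ParserPlugin {
public:
    void parse(const std::string& doc, EventSink& sink) override {
        for (std::size_t i = 0; i < doc.size(); ++i) {
            if (doc[i] != '<' || i + 1 >= doc.size() || doc[i + 1] == '/') continue;
            std::size_t end = doc.find('>', i);
            if (end == std::string::npos) break;
            sink.onStartElement(doc.substr(i + 1, end - i - 1));
            i = end;
        }
    }
};

// Stub binary plug-in: each byte is a token looked up in a small table.
// The token values are illustrative only.
class TokenStubParser : public ParserPlugin {
public:
    void parse(const std::string& doc, EventSink& sink) override {
        static const std::map<unsigned char, std::string> names = {{0x05, "a"}, {0x06, "b"}};
        for (unsigned char token : doc) {
            auto it = names.find(token);
            if (it != names.end()) sink.onStartElement(it->second);
        }
    }
};

// The intermediary layer: plug-ins are registered and selected by MIME type.
class ParserFramework {
public:
    using Factory = std::function<std::unique_ptr<ParserPlugin>()>;
    void registerPlugin(const std::string& mimeType, Factory factory) {
        factories_[mimeType] = std::move(factory);
    }
    void parse(const std::string& mimeType, const std::string& doc, EventSink& sink) const {
        auto it = factories_.find(mimeType);
        if (it == factories_.end()) throw std::runtime_error("no parser for " + mimeType);
        it->second()->parse(doc, sink);
    }
private:
    std::map<std::string, Factory> factories_;
};

ParserFramework makeFramework() {
    ParserFramework fw;
    fw.registerPlugin("text/xml", [] { return std::make_unique<XmlStubParser>(); });
    fw.registerPlugin("application/vnd.wap.wbxml",
                      [] { return std::make_unique<TokenStubParser>(); });
    return fw;
}
```

A client built against ParserFramework never names a concrete parser class, so a further plug-in could be registered later without changing the client.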

The present invention may provide a flexible and extensible file conversion system: for
example, the device could parse a document written in one mark-up language format and
then use the parsed document data to generate an equivalent document in a different file
format. Because of the extensible plug-in design of an implementation of the system, it
is possible to provide far greater kinds of file conversion capabilities than was previously
the case. New kinds of parsers and generators can be provided for loading onto a device
after that device has been shipped to an end-user. The only requirement is that they are
compatible with the intermediary layer.

Another advantage of the present invention is that it allows not only different parsers
and generators to be readily used by the same client, but also several different
clients to share the same parsers and generators. The API may itself be
extensible, so that extensions to its capabilities (e.g. to enable a new/extended mark-up
language of a document to be handled) can be made without affecting compatibility with
existing clients or existing parsers and generators. Similarly, new kinds of clients can be
provided for loading onto a device after that device has been shipped to an end-user.
The only requirement is that they are compatible with the intermediary layer.

2. Data validation/pre-filtering and altering components in a chain of
responsibility

The essence of this approach is that the mark-up language parser or generator can access
components to validate, pre-filter or alter data, in which the components are plug-in
components that operate using a chain of responsibility.

Because of the plug-in design of the components, the system is inherently flexible and
extensible compared with prior art systems in which a component (for validating,
pre-filtering or altering data from a parser or generator) would be tied exclusively to a given
parser. Hence, if a mark-up language of a document is extended, or a new one created, it
is possible to write any new validation/pre-filter/altering plug-in that is needed to work
with the extended or new language. These new kinds of validation/pre-filter/altering
plug-ins can be provided for loading onto a device even after that device has been
shipped to an end-user. The 'chain of responsibility' design pattern, whilst known in
object oriented programming, has not previously been used in the present context.

The plug-in components may all present a common, generic API to the parser and
generator. Hence, the same plug-in can be used with different types of parsers and
generators (e.g. an XML parser, a WBXML parser, an RTF parser etc.). The plug-ins also
present a common, generic API to a client component using the parser or generator.
Hence, the same plug-ins can be used by different clients.

For example, a DTD validator plug-in could be written that validates the mark-up of a
document and can report errors to the client. Or, for a web browser, an auto correction
plug-in filter could be written that tries to correct errors found in the mark-up language,
such as a missing end element tag, or an incorrectly placed element tag. The auto
correction plug-in will, if it can, fix the error transparently to the client. This enables a
web browser to still display a document rather than just displaying an error reporting that
there was an error in the document.

Because the plug-ins can be chained together, complex and different types of filtering and
validation can take place. In the example above, the parser could notify the validator
plug-in of elements it is parsing and these in turn would go to the auto correction plug-in to
be fixed if required, and finally the client would receive these events.



The mark-up framework allows parser plug-ins to expose the parsed element stack to all
validation/pre-filter/altering plug-ins. (The parsed element stack is a stack populated
with elements from a document extracted as that document is parsed; this stack is made
available to all validation/pre-filter/altering plug-ins to avoid the need to duplicate the
stack for each of these plug-ins.) This also enables the plug-ins to use the stack
information to aid in validation and filtering. For example, an auto corrector plug-in may
need to know the entire element list that is on the stack in order to figure out how to fix
a problem.
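To make the role of the shared element stack concrete, the following standard C++ sketch shows one way an auto corrector might use it to repair a missing end tag. The type ElementStack and the function correctEndTag are hypothetical names chosen for illustration; the source does not specify how the corrector works internally.

```cpp
#include <string>
#include <vector>

// Hypothetical shared element stack, owned by the parser session and passed
// by reference so that no plug-in needs its own copy.
using ElementStack = std::vector<std::string>;

// Hypothetical auto-corrector step. On an end tag that does not match the top
// of the stack, it closes the unclosed inner elements first. The return value
// lists the end-tag events that should reach the client, in order.
std::vector<std::string> correctEndTag(ElementStack& stack, const std::string& name) {
    std::vector<std::string> emitted;
    // Find the nearest open element with this name, searching from the top.
    int match = -1;
    for (int i = static_cast<int>(stack.size()) - 1; i >= 0; --i) {
        if (stack[i] == name) { match = i; break; }
    }
    if (match < 0) return emitted;  // stray end tag: nothing to close, drop it
    while (static_cast<int>(stack.size()) > match) {
        emitted.push_back(stack.back());  // inner elements are closed first
        stack.pop_back();
    }
    return emitted;
}
```

With the stack holding html, body, p and an incoming end tag for body, this sketch emits end events for p and then body, leaving html open.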

The use of filter/validator plug-ins in mark-up language generators is especially useful for
developers writing a client to the framework and generating mark-up documents, as the
same validator plug-in used by the parser can be used in the generator. Errors are
reported to the client when the mark-up does not conform to the validator, which
enables the developer to make sure they are writing well-formed mark-up that conforms
to the DTD and to catch errors early on during development.

The mark-up framework incorporates a character conversion module that enables
documents written in different character sets (e.g. ASCII, various Kanji character sets
etc.) to be parsed and converted to UTF8. This means a client obtains the results from
the parser in a generic way (UTF8) without having to know the original character set that
was used in the document. Clients hence no longer need to be able to differentiate
between different character sets and handle the different character sets appropriately.
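As a small illustration of converting to UTF8 before the client sees the data, the sketch below converts ISO-8859-1 input in standard C++. ISO-8859-1 is chosen here only because its mapping to Unicode is trivial; it is not named in the source, and the function names latin1ToUtf8 and toUtf8 are hypothetical. A real converter for Kanji character sets would need mapping tables.

```cpp
#include <stdexcept>
#include <string>

// Convert ISO-8859-1 bytes to UTF-8. Each Latin-1 byte has the Unicode code
// point of the same value, so bytes >= 0x80 become two-byte sequences.
std::string latin1ToUtf8(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (unsigned char c : in) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}

// Hypothetical dispatcher keyed on a character set name. Input labelled
// us-ascii or utf-8 is assumed to be valid and is passed through unchanged.
std::string toUtf8(const std::string& charset, const std::string& bytes) {
    if (charset == "utf-8" || charset == "us-ascii") return bytes;
    if (charset == "iso-8859-1") return latin1ToUtf8(bytes);
    throw std::invalid_argument("unsupported character set: " + charset);
}
```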
3. Generic Data Supplier API

The mark-up language parser or generator accesses data from a source using a generic
data supplier API. Hence, the parser or generator is insulated from having to talk directly
to a data source; instead, it does so via a generic data supplier API, acting as an
intermediary layer. This de-couples the parser or generator from the data source and
hence means that the parser or generator no longer has to be hard-coded for a specific
data supplier. This in turn leads to a simplification of the parser and generator design.

The present invention allows parsing and generation to be carried out with any data
source. For example, a buffer in memory could be used, as could a file, as could
streaming from a socket (hence enabling the ability to parse in real-time from data
streamed over the internet). There is no requirement to define, at parser/generator build
time, what particular data source will be used. Instead, the system allows any source that
can use the generic data supplier API to be adopted. New types of data sources can be
utilised by computing devices, even after those devices have been shipped to end-users.
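The decoupling described in this section can be sketched in standard C++ as a small abstract reader with interchangeable implementations. The names DataSupplierReader, BufferReader, FileReader and countStartTags are hypothetical and only loosely modelled on the MDataSupplierReader interface listed later; the "parser" is a stand-in that merely counts start tags.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <utility>

// Hypothetical generic reader interface: the parser sees only this.
class DataSupplierReader {
public:
    virtual ~DataSupplierReader() = default;
    // Copies up to maxBytes into out; returns bytes read, 0 at end of data.
    virtual std::size_t read(char* out, std::size_t maxBytes) = 0;
};

// Data source 1: a buffer in memory.
class BufferReader : public DataSupplierReader {
public:
    explicit BufferReader(std::string data) : data_(std::move(data)) {}
    std::size_t read(char* out, std::size_t maxBytes) override {
        std::size_t n = std::min(maxBytes, data_.size() - pos_);
        data_.copy(out, n, pos_);
        pos_ += n;
        return n;
    }
private:
    std::string data_;
    std::size_t pos_ = 0;
};

// Data source 2: a file. A missing file simply yields no data in this sketch.
class FileReader : public DataSupplierReader {
public:
    explicit FileReader(const std::string& path) : in_(path, std::ios::binary) {}
    std::size_t read(char* out, std::size_t maxBytes) override {
        in_.read(out, static_cast<std::streamsize>(maxBytes));
        return static_cast<std::size_t>(in_.gcount());
    }
private:
    std::ifstream in_;
};

// Stand-in for a parser: counts start tags, reading in small chunks so that
// state has to be carried across chunk boundaries, as it would for a socket.
std::size_t countStartTags(DataSupplierReader& source) {
    char chunk[8];
    std::size_t count = 0;
    bool afterLt = false;
    for (std::size_t n; (n = source.read(chunk, sizeof chunk)) > 0;) {
        for (std::size_t i = 0; i < n; ++i) {
            if (afterLt && chunk[i] != '/') ++count;
            afterLt = (chunk[i] == '<');
        }
    }
    return count;
}
```

Because countStartTags depends only on the abstract interface, a socket-backed reader could be added later without rebuilding it, which is the property the text claims for the framework.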

The present invention is implemented in a system called the Mark-Up Language 
Framework, used in SymbianOS from Symbian Limited, London, United Kingdom.
SymbianOS is an operating system for smart phones and advanced mobile telephones, 
and other kinds of portable computing devices. 



Appendix 1 describes the Mark-Up Language Framework in more detail. Appendix 2
describes a particular technique, referred to as 'String Pool', which is used in the
Mark-Up Language Framework. The appendices refer to various SymbianOS specific
programming techniques and structures. There is an extensive published literature
describing these techniques; reference may for example be made to 'Professional
Symbian Programming', Wrox Press Inc., ISBN: 186100303X.

Introduction
Purpose and Scope 

This document describes the architecture for a generic mark-up framework. The
framework is extendable by using plug-ins so that mark-up parsers and generators (e.g.
XML[1], WBXML[2]) can be used.

Design Overview 
Block Diagrams 

The mark-up framework block diagram is shown in the accompanying figure. The Client
is the application using the mark-up framework for parsing or
generating a document. The Parser and Generator components are ECOM plug-ins
specific to a mark-up language (e.g. XML or WBXML). These components use the
Namespace collection to retrieve information about a specific namespace during the
parsing or generating phase.

The Namespace Plug-in component is an ECOM plug-in that sets up all the elements,
attributes and attribute values for a namespace. For each namespace used there must be a
plug-in that describes the namespace. The namespace information is stored in a string
pool. The string pool is a way of storing strings that makes comparison almost
instantaneous at the expense of string creation. It is particularly efficient at handling
string constants that are known at compile time, which makes it very suitable for
processing documents. The Namespace owns the string pool that the Parser,
Generator and Client can gain access to.
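The trade-off described for the string pool (cheap comparison, more expensive creation) can be shown with a minimal interning sketch in standard C++. The class StringPool and its members here are hypothetical and much simpler than the SymbianOS string pool; they only illustrate that, once interned, two strings are equal exactly when their handles are equal.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical minimal string pool: interning costs a hash lookup and maybe a
// copy, after which equality is a single integer comparison of handles.
class StringPool {
public:
    using Handle = std::size_t;

    // Returns the existing handle for s, or stores s and returns a new one.
    Handle intern(const std::string& s) {
        auto it = index_.find(s);
        if (it != index_.end()) return it->second;
        strings_.push_back(s);
        index_.emplace(s, strings_.size() - 1);
        return strings_.size() - 1;
    }

    // Recovers the text for a handle; throws std::out_of_range if invalid.
    const std::string& text(Handle h) const { return strings_.at(h); }

private:
    std::vector<std::string> strings_;
    std::unordered_map<std::string, Handle> index_;
};
```

A namespace plug-in would intern its element and attribute names up front, so that a parser matching a tag against a known name compares handles rather than characters.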

The Namespace Plug-in simply sets up the string pool with the required strings for the
namespace the plug-in represents. The Client may get access to the Namespace
Collection via the Parser or Generator to pre-load namespaces prior to parsing or
generating documents, which may speed up the parsing or generating session.

The Plug-in components (1 - 4) are optional and allow further processing of the data
before the client receives it, such as DTD validators or document auto correctors.
Validators check that the elements and attributes conform to the DTD. Document auto
correction plug-ins are used to try to correct errors reported from DTD validators.

The parser is event driven and sends events to the various plug-ins and UI during
parsing. The accompanying figure shows a client parsing with a DTD
validator and auto corrector. The client talks to the parser directly to start the parse. The
parser sends events to the chain of plug-ins. The first plug-in that receives events is the
DTD validator plug-in. This plug-in validates that the data in the event it received is
correct. If it is not correct, it sends the event on to the auto corrector unchanged except
for an error code that describes the problem the validator encountered. If the event data
is valid, the same event is sent to the auto corrector. The auto corrector then receives the
event and can check for any errors. If there is an error, it can attempt to correct it. If it
can correct the error, it modifies the data in the event and removes the error code before
sending the event to the client. The client finally receives the event and can now handle it.
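The parse-side event flow just described (parser, then validator, then auto corrector, then client) can be sketched in standard C++ as a chain of handlers. All names (Event, Handler, Validator, AutoCorrector, Client, kErrUnknownElement) are hypothetical, and the "correction" applied here, lowercasing an unknown element name, is an assumption chosen to keep the example short.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct Event {
    std::string element;
    int error;  // 0 means no error
};

const int kErrUnknownElement = 1;

// Common interface shared by plug-ins and the client.
class Handler {
public:
    virtual ~Handler() = default;
    virtual void onEvent(Event e) = 0;
};

// End of the chain: the client records what it receives.
class Client : public Handler {
public:
    std::vector<Event> received;
    void onEvent(Event e) override { received.push_back(e); }
};

// Validator: flags unknown elements but always forwards the event.
class Validator : public Handler {
public:
    Validator(std::set<std::string> allowed, Handler& next)
        : allowed_(std::move(allowed)), next_(next) {}
    void onEvent(Event e) override {
        if (allowed_.count(e.element) == 0) e.error = kErrUnknownElement;
        next_.onEvent(e);
    }
private:
    std::set<std::string> allowed_;
    Handler& next_;
};

// Auto corrector: tries to repair a flagged event; on success it modifies the
// data and clears the error code before forwarding.
class AutoCorrector : public Handler {
public:
    AutoCorrector(std::set<std::string> allowed, Handler& next)
        : allowed_(std::move(allowed)), next_(next) {}
    void onEvent(Event e) override {
        if (e.error == kErrUnknownElement) {
            std::string lower = e.element;
            std::transform(lower.begin(), lower.end(), lower.begin(),
                           [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
            if (allowed_.count(lower) != 0) {
                e.element = lower;
                e.error = 0;
            }
        }
        next_.onEvent(e);
    }
private:
    std::set<std::string> allowed_;
    Handler& next_;
};
```

The chain is assembled back to front: the client is created first, the auto corrector is given the client as its next handler, and the validator is given the auto corrector. The parser then calls only the validator.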
The accompanying figure illustrates a client generating using DTD
validator and auto corrector plug-ins. A real client would probably never use a generator
and auto corrector, since the data the client generates should always be valid, but it is used
here to show the flow of events from a generator and any plug-ins attached.

The client sends a build request to the generator. The first thing the generator does is to
send the request as an event to the DTD validator plug-in. The situation is similar to the
parser: the DTD validator plug-in validates that the data in the event it received is
correct. If it is not correct, it sends the event on to the auto corrector unchanged except
for an error code that describes the problem the validator encountered. If the event data
is valid, the same event is sent to the auto corrector. The auto corrector then receives the
event and can check for any errors. If there is an error, it can attempt to correct it. If it
can correct the error, it modifies the data in the event and removes the error code before
sending the event back to the generator. The major difference between the events during
parsing and generating is that, with generating, once the final plug-in has dealt with the
event it gets sent back to the generator. The generator receives the event and builds up
part of the document using the details from the event.
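The generator-side flow differs from parsing in that the event returns to the generator after the last plug-in. A minimal standard C++ sketch of that round trip follows; the names BuildEvent, Plugin, Generator, validateName and stripSpaces are hypothetical, and the validation rule (no spaces in element names) is an assumption made only for the example.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

struct BuildEvent {
    std::string element;
    int error;  // 0 means no error
};

using Plugin = std::function<void(BuildEvent&)>;

// Hypothetical validator plug-in: element names must be non-empty, no spaces.
void validateName(BuildEvent& e) {
    if (e.element.empty() || e.element.find(' ') != std::string::npos) e.error = 1;
}

// Hypothetical auto corrector plug-in: removes spaces and clears the error.
void stripSpaces(BuildEvent& e) {
    if (e.error == 0) return;
    std::string fixed;
    for (char c : e.element) {
        if (c != ' ') fixed += c;
    }
    if (!fixed.empty()) {
        e.element = fixed;
        e.error = 0;
    }
}

class Generator {
public:
    explicit Generator(std::vector<Plugin> plugins) : plugins_(std::move(plugins)) {}

    // Sends the request through the plug-in chain, then builds output from
    // whatever event comes back. Returns the error code left on the event.
    int buildStartElement(const std::string& name) {
        BuildEvent e{name, 0};
        for (auto& plugin : plugins_) plugin(e);
        if (e.error == 0) output_ += "<" + e.element + ">";
        return e.error;
    }

    const std::string& document() const { return output_; }

private:
    std::vector<Plugin> plugins_;
    std::string output_;
};
```

In this sketch a request that is still in error after the chain produces no output and the error code is returned to the caller, which corresponds to reporting the error to the client.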




Parsing and Generating WBXML

Parsing WBXML is quite different to parsing XML or HTML. The main difference is that
elements and attributes are defined as tokens rather than using their text representation.
This means a mapping needs to be stored between a WBXML token and its static string
representation. The Namespace plug-in for a particular namespace will store these
mappings. A WBXML parser and generator can then obtain a string from the
namespace plug-in given the WBXML token and vice versa.
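The two-way mapping held by a namespace plug-in can be sketched in standard C++ (C++17 for std::optional) as a pair of lookup tables. The class NamespaceTable and the function makeDemoTable are hypothetical, and the token values below are illustrative placeholders rather than values taken from this document or checked against a published token table.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical per-namespace table mapping tokens to names and back.
class NamespaceTable {
public:
    void add(std::uint8_t token, const std::string& name) {
        byToken_[token] = name;
        byName_[name] = token;
    }

    // Used by a parser: token in, element name out.
    std::optional<std::string> element(std::uint8_t token) const {
        auto it = byToken_.find(token);
        if (it == byToken_.end()) return std::nullopt;
        return it->second;
    }

    // Used by a generator: element name in, token out.
    std::optional<std::uint8_t> token(const std::string& name) const {
        auto it = byName_.find(name);
        if (it == byName_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::map<std::uint8_t, std::string> byToken_;
    std::map<std::string, std::uint8_t> byName_;
};

// Illustrative contents only; the token values are assumptions.
NamespaceTable makeDemoTable() {
    NamespaceTable table;
    table.add(0x2B, "SyncBody");
    table.add(0x2C, "SyncHdr");
    table.add(0x2D, "SyncML");
    return table;
}
```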

Class Diagram

The class diagram for the mark-up framework is shown in the accompanying figure. The
diagram also depicts plug-ins that make use of the framework. The green classes
(dark grey in black and white) are the plug-ins that provide implementation to the
mark-up framework. CXmlParser and CWbxmlParser provide an implementation to parse
XML and WBXML documents respectively. In the same way, CXmlGenerator and
CWbxmlGenerator generate XML and WBXML documents respectively. CValidator is
a plug-in which will validate the mark-up document during parsing or generating.
CAutoCorrector is a plug-in that corrects invalid mark-up documents.

When a document is parsed and the client receives an event, for example for the start of
an element (OnStartElementL), the element RString in the event is a handle to a string in
the string pool. If this is a known string, i.e. one that has been added by the Namespace
Plug-in, then the string will be static. Otherwise, if it is an unknown string, the parser will
add the string to the string pool as a dynamic string and return an RString with a handle
to this string. It is not possible to know if an RString is dynamic or static, so the parser or
generator that obtains an RString must be sure to close it to ensure any memory is
released if the string is dynamic. A client that wishes to use the RString after the event
returns to the parser must make a copy of it, which will increase the reference count and
make sure it is not deleted when the parser closes it.
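The ownership rule for static and dynamic strings can be illustrated with a small reference-counting sketch in standard C++. RefCountedPool and its members are hypothetical and do not reproduce the RString API; handles are represented by the string itself to keep the example short.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical pool that distinguishes static entries (never removed) from
// dynamic entries (removed when their reference count drops to zero).
class RefCountedPool {
public:
    // Static strings would typically be loaded by a namespace plug-in.
    void addStatic(const std::string& s) { entries_[s] = Entry{true, 0}; }

    // Opening or copying a handle: dynamic entries are created on demand and
    // their reference count is incremented.
    void open(const std::string& s) {
        auto it = entries_.find(s);
        if (it == entries_.end()) it = entries_.emplace(s, Entry{false, 0}).first;
        if (!it->second.isStatic) ++it->second.refs;
    }

    // Closing a handle: safe to call for static and dynamic strings alike,
    // which is why the caller does not need to know which kind it holds.
    void close(const std::string& s) {
        auto it = entries_.find(s);
        if (it == entries_.end() || it->second.isStatic) return;
        if (it->second.refs > 0 && --it->second.refs == 0) entries_.erase(it);
    }

    bool contains(const std::string& s) const { return entries_.count(s) != 0; }

private:
    struct Entry {
        bool isStatic;
        std::size_t refs;
    };
    std::map<std::string, Entry> entries_;
};
```

If the parser opens a dynamic string and the client takes a copy (a second open), the string survives the parser's close and is released only when the client closes its copy too.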

The accompanying figure is an example class diagram that shows the major
classes for parsing WBXML SyncML documents. The client creates a
CDescriptorDataSupplier that supplies the data to the parser. CWbxmlParser is the
class that actually parses the document. CSyncMLNamespace is the namespace for
SyncML that the parser uses to map WBXML tokens to strings. All the other classes
belong to the mark-up framework. To parse a document with different namespaces, the
only thing that needs to be added is a plug-in for each namespace.

| Object name | Description | Associated (owned/dependent) objects |
|---|---|---|
| MMarkupCallback | A call-back that a client must implement so that the parser can report events back to the client during the parsing session. | Inherited by clients and plug-ins. |
| RNamespaceCollection | Contains a collection of namespaces. Contains a reference counter so multiple parsers or generators may use the same namespace collection. | Owned by either CParserSession or CGeneratorSession. Owns an array of CMarkupNamespace plug-ins. |
| CMarkupNamespace | ECOM interface to implement a namespace. | Inherited by any namespace plug-ins. |
| RParserSession | Public interface for a client to create a parser session. | Owned by the client. |
| RGeneratorSession | Public interface for a client to create a generator session. | Owned by the client. |
| CMarkupCharSetConverter | Helper function which uses CCnvCharacterSetConverter for the client, parser and generator to do any character set conversions or resolving MIB Enums or Internet-standard names of character sets. | Owned by RParserSession and RGeneratorSession. |
| CMarkupPluginBase | Generic interface for any type of plug-in. | Inherited by CMarkupPlugin, CParserSession and CGeneratorSession. |
| CMarkupPlugin | ECOM interface for plug-ins to be used by the parser and generator. | Owned by CParserSession or CGeneratorSession. |
| MDataSupplierReader | Pure virtual interface to be implemented by a data supplier for reading data. | Inherited by the client's data provider. |
| MDataSupplierWriter | Pure virtual interface to be implemented by a data supplier for writing data. | Inherited by the client's data provider. |
| CParserSession | ECOM interface for parser plug-ins. | Inherited by a concrete parser implementation. |
| CGeneratorSession | ECOM interface for generator plug-ins. | Inherited by a concrete generator implementation. |
| RAttribute | Contains the name and value of an attribute. | Used by the parser, generator and client. |

The classes below are not part of the framework but illustrate how the framework can be used.

| Object name | Description | Associated (owned/dependent) objects |
|---|---|---|
| CValidator | A DTD, schema or some other type of validator. | Owned by RParserSession or RGeneratorSession. |
| CAutoCorrector | Used to auto correct invalid data. | Owned by RParserSession or |
| CXmlParser | An XML parser implementation. | Owned by RParserSession. |
| CWbxmlParser | A WBXML parser implementation. | Owned by RParserSession. |
| CXmlGenerator | An XML generator implementation. | Owned by RGeneratorSession. |
| CWbxmlGenerator | A WBXML generator implementation. | Owned by RGeneratorSession. |
| CNamespace | A namespace plug-in to use with a parser and generator. | Owned by RNamespaceCollection. |
| RElementStack | A stack of the currently processed elements during parsing or generating. | Owned by CParserSession and CGeneratorSession. |


Detailed Design 
RParserSession 



The following is the public API for this class: 



| Method | Description |
|---|---|
| void OpenL(MDataSupplierReader& aReader, const TDesC8& aMarkupMimeType, const TDesC8& aDocumentMimeType, MMarkupCallback& aCallback) | Opens a parser session. aReader is the data supplier reader to use during parsing. aMarkupMimeType is the MIME type of the parser to open. aDocumentMimeType is the MIME type of the document to parse. aCallback is a reference to the call-back so the parser can report events. |
| void OpenL(MDataSupplierReader& aReader, const TDesC8& aMarkupMimeType, const TDesC8& aDocumentMimeType, MMarkupCallback& aCallback, RMarkupPlugins aPlugins[]) | Opens a parser session. aReader is the data supplier reader to use during parsing. aMarkupMimeType is the MIME type of the parser to open. aDocumentMimeType is the MIME type of the document to parse. aCallback is a reference to the call-back so the parser can report events. aPlugins is an array of plug-ins to use with the parser. The first plug-in in the list is the first plug-in to be called back from the parser. The first plug-in will then call back to the second plug-in etc. |
| void OpenL(MDataSupplierReader& aReader, const TDesC8& aMarkupMimeType, const TDesC8& aDocumentMimeType, MMarkupCallback& aCallback, RMarkupPlugins aPlugins[], RNamespaceCollection aNamespaceCollection) | Opens a parser session. aReader is the data supplier reader to use during parsing. aMarkupMimeType is the MIME type of the parser to open. aDocumentMimeType is the MIME type of the document to parse. aCallback is a reference to the call-back so the parser can report events. aPlugins is an array of plug-ins to use with the parser. The first plug-in in the list is the first plug-in to be called back from the parser. The first plug-in will then call back to the second plug-in etc. aNamespaceCollection is a handle to a previous namespace collection. This is useful if a generator or another parser session has been created so that the same namespace collection can be shared. |
| void Close() | Closes the parser session. |
| void Start() | Starts parsing the document. |
| void Stop() | Stops parsing the document. |
| void Reset(MDataSupplierReader& aReader, MMarkupCallback& aCallback) | Resets the parser ready to parse a new document. aReader is the data supplier reader to use during parsing. aCallback is a reference to the call-back so the parser can report events. |
| TInt SetParseMode(TInt aParseMode) | Selects one or more parse modes. aParseMode is one or more of the following: EConvertTagsToLowerCase (converts elements and attributes to lowercase; this can be used for case-insensitive HTML so that a tag can be matched to a static string in the string pool); EErrorOnUnrecognisedTags (reports an error when unrecognised tags are found); EReportUnrecognisedTags (reports unrecognised tags); EReportNamespaces (reports the namespace); EReportNamespacePrefixes (reports the namespace prefix); ESendFullContentInOneChunk (sends all content data for an element in one chunk); EReportNameSpaceMapping (reports namespace mappings via the DoStartPrefixMapping() and DoEndPrefixMapping() methods). If this function is not called, the default will be EReportUnrecognisedTags combined with EReportNamespaces. If the parsing mode is not supported, KErrNotSupported is returned. |
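Since SetParseMode accepts "one or more" modes in a single integer, the modes are presumably bit flags. The following standard C++ sketch shows that reading. The flag values, the default mode and the class ParserSession are assumptions made for illustration; only the mode names and the KErrNotSupported return come from the text, and the numeric values of KErrNone and KErrNotSupported follow the usual SymbianOS definitions.

```cpp
// Mode names follow the table; the bit positions are assumptions.
enum ParseMode : unsigned {
    EConvertTagsToLowerCase    = 1u << 0,
    EErrorOnUnrecognisedTags   = 1u << 1,
    EReportUnrecognisedTags    = 1u << 2,
    EReportNamespaces          = 1u << 3,
    EReportNamespacePrefixes   = 1u << 4,
    ESendFullContentInOneChunk = 1u << 5,
    EReportNameSpaceMapping    = 1u << 6
};

const int KErrNone = 0;
const int KErrNotSupported = -5;

// Hypothetical session: each concrete parser declares which modes it supports.
class ParserSession {
public:
    explicit ParserSession(unsigned supportedModes) : supported_(supportedModes) {}

    // Rejects the whole request if any requested bit is unsupported, leaving
    // the current mode unchanged.
    int SetParseMode(unsigned mode) {
        if ((mode & ~supported_) != 0) return KErrNotSupported;
        mode_ = mode;
        return KErrNone;
    }

    bool isSet(ParseMode m) const { return (mode_ & m) != 0; }

private:
    unsigned supported_;
    // Assumed default, used when SetParseMode is never called.
    unsigned mode_ = EReportUnrecognisedTags | EReportNamespaces;
};
```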







RGeneratorSession

The following is the public API for this class:

| Method | Description |
|---|---|
| void OpenL(MDataSupplierWriter& aWriter, TUid aMarkupMimeType, const TDesC8& aDocumentMimeType) | Opens a generator session. aWriter is the data supplier writer used to generate a document. aMarkupMimeType is the MIME type of the generator to open. aDocumentMimeType is the MIME type of the document to parse. |
| void OpenL(MDataSupplierWriter& aWriter, TUid aMarkupMimeType, const TDesC8& aDocumentMimeType, RMarkupPlugins aPlugins[]) | Opens a generator session. aWriter is the data supplier writer used to generate a document. aMarkupMimeType is the MIME type of the generator to open. aDocumentMimeType is the MIME type of the document to parse. aPlugins is an array of plug-ins to use with the generator. |
| void OpenL(MDataSupplierWriter& aWriter, TUid aMarkupMimeType, const TDesC8& aDocumentMimeType, RMarkupPlugins aPlugins[], RNamespaceCollection aNamespaceCollection) | Opens a generator session. aWriter is the data supplier writer used to generate a document. aMarkupMimeType is the MIME type of the generator to open. aDocumentMimeType is the MIME type of the document to parse. aPlugins is an array of plug-ins to use with the generator. aNamespaceCollection is a handle to a previous namespace collection. This is useful if a generator or another parser session has been created so that the same namespace collection can be shared. |
| void Close() | Closes the generator session. |
| void Reset(MDataSupplierWriter& aWriter, MMarkupCallback& aCallback) | Resets the generator ready to generate a new document. aWriter is the data supplier writer used to generate a document. aCallback is a reference to the call-back so the generator can report events. |
| void BuildStartDocumentL(RDocumentParameters aDocParam) | Builds the start of the document. aDocParam specifies the various parameters of the document. In the case of WBXML this would state the public ID and string table. |
| void BuildEndDocumentL() | Builds the end of the document. |
| void BuildStartElementL(RTagInfo& aElement, RAttributeArray& aAttributes) | Builds the start element with attributes and namespace if specified. aElement is a handle to the element's details. aAttributes contains the attributes for the element. |
| void BuildEndElementL(RTagInfo& aElement) | Builds the end of the element. aElement is a handle to the element's details. |
| void BuildContentL(const TDesC8& aContentPart) | Builds part or all of the content. Large content should be built in chunks, i.e. this function should be called many times, once for each chunk. aContentPart is the raw content data. This data must be converted to the correct character set by the client. |
| void BuildPrefixMappingL(RString& aPrefix, RString& aUri) | Builds a prefix - URI namespace for the next element to be built. This method can be called for each namespace that needs to be declared. aPrefix is the Namespace prefix being declared. aUri is the Namespace URI the prefix is mapped to. |
| void BuildProcessingInstructionL(RString& aTarget, RString& aData) | Builds a processing instruction. aTarget is the processing instruction target. aData is the processing instruction data. |



RTagInfo

| Method | Description |
|---|---|
| void Open(RString& aUri, RString& aPrefix, RString& aLocalName) | Sets the tag information for an element or attribute. aUri is the URI of the namespace. aPrefix is the prefix of the qualified name. aLocalName is the local name of the qualified name. |
| void Close() | Closes the tag information. |
| RString& Prefix() | Returns the prefix. |


RNamespaceCollection

The following is the public API for this class:

Method / Description

void Connect()

    Every time this method is called a reference counter is incremented so
    that the namespace collection is only destroyed when no clients are
    using it.

void Close()

    Every time this method is called a reference counter is decremented and
    the object is destroyed only when the reference counter is zero.

const CMarkupNamespace& OpenNamespaceL(
    const TDesC8& aMimeType)

    Opens a namespace plug-in and returns a reference to the namespace
    plug-in. If the namespace plug-in is not loaded it will be automatically
    loaded.
    aMimeType is the MIME type of the plug-in to open.

const CMarkupNamespace& OpenNamespaceL(
    TUint8 aCodePage)

    Opens a namespace plug-in and returns a reference to the namespace
    plug-in.
    aCodePage is the code page of the plug-in to open.

void Reset()

    Resets the namespace collection and string pool.

RStringPool StringPool()

    Returns a handle to the string pool object.
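
The reference counting behind Connect() and Close(), together with the load-on-first-open behaviour of OpenNamespaceL(), can be illustrated with a minimal sketch in standard C++. It is not the Symbian implementation: the class name NamespaceCollection, the use of a std::set to stand for loaded plug-ins, the bool return value and the IsAlive() helper are all assumptions made for illustration.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for RNamespaceCollection (names are assumptions).
class NamespaceCollection {
public:
    // Each client connects once; the shared state is created on first use.
    void Connect() {
        if (refCount_++ == 0) {
            loaded_ = std::make_unique<std::set<std::string>>();
        }
    }

    // The shared state is destroyed only when the last client closes.
    void Close() {
        if (refCount_ > 0 && --refCount_ == 0) {
            loaded_.reset();
        }
    }

    // Returns true if the plug-in had to be loaded, false if already loaded.
    bool OpenNamespace(const std::string& mimeType) {
        if (!loaded_) throw std::logic_error("collection is not connected");
        return loaded_->insert(mimeType).second;
    }

    bool IsAlive() const { return loaded_ != nullptr; }

private:
    int refCount_ = 0;
    std::unique_ptr<std::set<std::string>> loaded_;
};
```

With two connected clients, the first Close() leaves the collection alive and the second destroys it.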


CMarkupNamespace

The following is the API for this class:

Method / Description

void NewL(RStringPool aStringPool)

    Creates the namespace plug-in.
    aStringPool is a handle to the string pool to which static string
    tables are added.

RString& Element(
    TUint8 aWbxmlToken) const

    Returns a handle to the string.
    aWbxmlToken is the WBXML token of the element.

void AttributeValuePair(
    TUint8 aWbxmlToken,
    RString& aAttribute,
    RString& aValue) const

    Returns a handle to the attribute and value strings.
    aWbxmlToken is the WBXML token of the attribute.
    aAttribute is the handle to the attribute string.
    aValue is the handle to the value string.

RString& AttributeValue(
    TUint8 aWbxmlToken) const

    Returns a handle to an attribute value.
    aWbxmlToken is the WBXML token of the attribute.

RString& NamespaceUri() const

    Returns the namespace name.

TUint8 CodePage() const

    Returns the code page for this namespace.


RTableCodePage

The following is the API for this class:

Method / Description

RString NameSpaceUri()

    Returns the namespace URI for this code page.

TInt StringPoolIndexFromToken(
    TInt aToken)

    Gets a StringPool index from a token value. -1 is returned if the item
    is not found.

TInt TokenFromStringPoolIndex(
    TInt aIndex)

    Gets a token value from a StringPool index. -1 is returned if the item
    is not found.
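
The two lookups can be sketched as a pair of maps kept in step, one for each direction. This is an illustration in standard C++ rather than the actual class: the name TableCodePage, the Add() helper and the use of std::unordered_map are assumptions. Only the two lookup names and the -1 "not found" result come from the table above.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical stand-in for RTableCodePage (Add() is an assumed helper).
class TableCodePage {
public:
    // Registers one token <-> string pool index pair in both directions.
    void Add(int token, int stringPoolIndex) {
        tokenToIndex_[token] = stringPoolIndex;
        indexToToken_[stringPoolIndex] = token;
    }

    // Returns the string pool index for a token, or -1 if not found.
    int StringPoolIndexFromToken(int token) const {
        auto it = tokenToIndex_.find(token);
        return it == tokenToIndex_.end() ? -1 : it->second;
    }

    // Returns the token for a string pool index, or -1 if not found.
    int TokenFromStringPoolIndex(int index) const {
        auto it = indexToToken_.find(index);
        return it == indexToToken_.end() ? -1 : it->second;
    }

private:
    std::unordered_map<int, int> tokenToIndex_;
    std::unordered_map<int, int> indexToToken_;
};
```

Keeping two maps costs extra memory but makes both directions a single lookup.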


CMarkupPluginBase

The following is the API for this ECOM class:

Method / Description

CMarkupPluginBase& RootPlugin()

    Returns a reference to the root plug-in. This must be either a parser
    or generator plug-in.

CMarkupPluginBase& ParentPlugin()

    Returns a reference to the parent plug-in.

RElementStack& ElementStack()

    Returns a handle to the element stack.

RNamespaceCollection& NamespaceCollection()

    Returns a handle to the namespace collection.

CMarkupCharSetConverter& CharSetConverter()

    Returns a reference to the character set converter object.

TBool IsChildElementValid(
    RString& aParentElement,
    RString& aChildElement)

    Checks if aChildElement is a valid child of aParentElement.



CMarkupPlugin

The following is the API for this ECOM class:

Method / Description

CMarkupPlugin* NewL(
    MMarkupCallback& aCallback)

    Creates an instance of a mark-up plug-in.
    aCallback is a reference to the call-back to report events.

void SetParent(
    CMarkupPluginBase* aParentPlugin)

    Sets the parent plug-in for this plug-in.
    aParentPlugin is a pointer to the parent plug-in, or NULL if there is
    no parent. A parser or generator does not have a parent, so this must
    not be set; the default NULL indicates that there is no parent.


CParserSession

The following is the API for this ECOM class:

Method / Description

CParserSession* NewL(
    MDataSupplierReader& aReader,
    const TDesC8& aMarkupMimeType,
    const TDesC8& aDocumentMimeType,
    MMarkupCallback& aCallback,
    RNamespaceCollection* aNamespaceCollection,
    CMarkupCharSetConverter& aCharSetConverter)

    Opens a parser session.
    aReader is the data supplier reader to use during parsing.
    aMarkupMimeType is the MIME type of the parser to open.
    aDocumentMimeType is the MIME type of the document to parse.
    aCallback is a reference to the call-back so the parser can report
    events.
    aNamespaceCollection is a handle to a previous namespace collection.
    Set to NULL if a new RNamespaceCollection is to be used.
    aCharSetConverter is a reference to the character set conversion class.

void Start()

    Start parsing the document.

void Stop()

    Stop parsing the document.

void Reset(
    MDataSupplierReader& aReader,
    MMarkupCallback& aCallback)

    Resets the parser ready to parse a new document.
    aReader is the data supplier reader to use during parsing.
    aCallback is a reference to the call-back so the parser can report
    events.

void SetParseMode(
    TInt aParseMode)

    Selects one or more parse modes.
    See RParserSession for details on aParseMode.
CGeneratorSession

The following is the API for this ECOM class:

Method / Description

void OpenL(
    MDataSupplierWriter& aWriter,
    TUid aMarkupMimeType,
    const TDesC8& aDocumentMimeType,
    MMarkupCallback& aCallback,
    RNamespaceCollection* aNamespaceCollection,
    CMarkupCharSetConverter& aCharSetConverter)

    Opens a generator session.
    aWriter is the data supplier writer used to generate a document.
    aMarkupMimeType is the MIME type of the generator to open.
    aDocumentMimeType is the MIME type of the document to generate.
    aCallback is a reference to the call-back so the generator can report
    events.
    aNamespaceCollection is a handle to a previous namespace collection.
    Set to NULL if a new RNamespaceCollection is to be used.
    aCharSetConverter is a reference to the character set conversion class.

void Reset(
    MDataSupplierWriter& aWriter,
    MMarkupCallback& aCallback)

    Resets the generator ready to generate a new document.
    aWriter is the data supplier writer used to generate a document.
    aCallback is a reference to the call-back so the generator can report
    events.

void BuildStartDocumentL(
    RDocumentParameters aDocParam)

    Builds the start of the document.
    aDocParam specifies the various parameters of the document.

void BuildEndDocumentL()

    Builds the end of the document.

void BuildStartElementL(
    RTagInfo& aElement,
    RAttributeArray& aAttributes)

    Builds the start element with attributes and namespace if specified.
    aElement is a handle to the element's details.
    aAttributes contains the attributes for the element.

void BuildEndElementL(
    RTagInfo& aElement)

    Builds the end of the element.
    aElement is a handle to the element's details.

void BuildContentL(
    const TDesC8& aContentPart)

    Builds part or all of the content. Large content should be built in
    chunks, i.e. this function should be called many times, once for each
    chunk.
    aContentPart is the raw content data. This data must be converted to
    the correct character set by the client.

void BuildProcessingInstructionL(
    RString& aTarget,
    RString& aData)

    Builds a processing instruction.
    aTarget is the processing instruction target.
    aData is the processing instruction data.



RAttribute

The following is the API for this class:

Method / Description

RTagInfo& Attribute()

    Returns a handle to the attribute's name details.

TAttributeType Type()

    Returns the attribute's type, where TAttributeType is one of the
    following enumeration values:
        CDATA
        ID
        IDREF
        IDREFS
        NMTOKEN
        NMTOKENS
        ENTITY
        ENTITIES
        NOTATION

RString& Value()

    Returns a handle to the attribute value. If the attribute value is a
    list of tokens (IDREFS, ENTITIES or NMTOKENS), the tokens will be
    concatenated into a single RString with each token separated by a
    single space.
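
The concatenation rule for list-valued attributes can be shown with a few lines of standard C++. The function name JoinTokens and the use of std::string in place of RString are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Joins attribute tokens into one string, separated by single spaces.
// JoinTokens is a hypothetical helper, not part of the described API.
std::string JoinTokens(const std::vector<std::string>& tokens) {
    std::string joined;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        if (i != 0) joined += ' ';  // separator only between tokens
        joined += tokens[i];
    }
    return joined;
}
```

An IDREFS value holding the tokens id1, id2 and id3 would therefore be presented to the client as the single string "id1 id2 id3".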
MDataSupplierReader

The following is the API for this mix-in class:

Method / Description

TUint8 GetByteL()

    Gets a single byte from the data supplier.

const TDesC8& GetBytesL(
    TInt aNumberOfBytes)

    Gets a descriptor of size aNumberOfBytes. If the number of bytes is not
    available this method leaves with KErrEof. The returned descriptor must
    not be deleted until another call to GetBytesL() or EndTransactionL()
    is made.

void StartTransactionL()

    The parser calls this to indicate the start of a transaction.

void EndTransactionL()

    The parser calls this to indicate the transaction has ended. Any data
    stored for the transaction may now be deleted.

void RollbackL()

    The parser calls this to indicate the transaction must be rolled back
    to the exact state as when StartTransactionL() was called.
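
A minimal sketch of how a parser might use such a reader is given below, written in standard C++ rather than Symbian C++. The names DataSupplierReader, BufferReader, EndOfData and TryReadTag are assumptions, descriptors are replaced by std::string, and a C++ exception stands in for a leave with KErrEof. The point is that the parser helper sees only the abstract interface, so a file or socket reader could be substituted for the in-memory buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical standard-C++ stand-in for the MDataSupplierReader mix-in.
class DataSupplierReader {
public:
    virtual ~DataSupplierReader() = default;
    virtual std::uint8_t GetByte() = 0;
    virtual std::string GetBytes(std::size_t count) = 0;
    virtual void StartTransaction() = 0;
    virtual void EndTransaction() = 0;
    virtual void Rollback() = 0;
};

// Thrown where the Symbian API would leave with KErrEof (assumption).
struct EndOfData : std::runtime_error {
    EndOfData() : std::runtime_error("end of data") {}
};

// One possible data source: a buffer in memory.
class BufferReader : public DataSupplierReader {
public:
    explicit BufferReader(std::string data) : data_(std::move(data)) {}

    std::uint8_t GetByte() override {
        if (pos_ >= data_.size()) throw EndOfData();
        return static_cast<std::uint8_t>(data_[pos_++]);
    }

    std::string GetBytes(std::size_t count) override {
        if (count > data_.size() - pos_) throw EndOfData();
        std::string out = data_.substr(pos_, count);
        pos_ += count;
        return out;
    }

    void StartTransaction() override { mark_ = pos_; }  // remember position
    void EndTransaction() override { mark_ = pos_; }    // commit what was read
    void Rollback() override { pos_ = mark_; }          // restore position

private:
    std::string data_;
    std::size_t pos_ = 0;
    std::size_t mark_ = 0;
};

// Parser-side helper: consumes `expected` if present, otherwise rolls back.
bool TryReadTag(DataSupplierReader& reader, const std::string& expected) {
    reader.StartTransaction();
    try {
        if (reader.GetBytes(expected.size()) == expected) {
            reader.EndTransaction();
            return true;
        }
    } catch (const EndOfData&) {
        // Not enough data: fall through to the rollback.
    }
    reader.Rollback();
    return false;
}
```

A failed match leaves the read position where it was, so the parser can try another alternative against the same bytes.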



MDataSupplierWriter

Method / Description

void PutByteL(
    TUint8 aByte)

    Puts a byte in the data supplier.

void PutBytesL(
    const TDesC8& aBytes)

    Puts a descriptor in the data supplier.
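
The writer side can be sketched in the same way. The following standard C++ example is an illustration under assumptions: DataSupplierWriter, BufferWriter and TextGenerator are hypothetical names, std::string replaces descriptors, and the toy generator writes plain text tags without escaping or attributes. It shows a generator that emits output only through the abstract writer, and content being built in chunks.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical standard-C++ stand-in for the MDataSupplierWriter mix-in.
class DataSupplierWriter {
public:
    virtual ~DataSupplierWriter() = default;
    virtual void PutByte(std::uint8_t byte) = 0;
    virtual void PutBytes(const std::string& bytes) = 0;
};

// One possible data sink: a buffer in memory.
class BufferWriter : public DataSupplierWriter {
public:
    void PutByte(std::uint8_t byte) override {
        buffer_.push_back(static_cast<char>(byte));
    }
    void PutBytes(const std::string& bytes) override { buffer_ += bytes; }
    const std::string& Buffer() const { return buffer_; }

private:
    std::string buffer_;
};

// Toy text generator: knows nothing about where the bytes end up.
class TextGenerator {
public:
    explicit TextGenerator(DataSupplierWriter& writer) : writer_(writer) {}

    void BuildStartElement(const std::string& name) {
        writer_.PutByte('<');
        writer_.PutBytes(name);
        writer_.PutByte('>');
    }
    // May be called many times, once per chunk of content.
    void BuildContent(const std::string& part) { writer_.PutBytes(part); }
    void BuildEndElement(const std::string& name) {
        writer_.PutBytes("</");
        writer_.PutBytes(name);
        writer_.PutByte('>');
    }

private:
    DataSupplierWriter& writer_;
};
```

Swapping BufferWriter for a writer backed by a file or socket would not require any change to TextGenerator.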


MMarkupCallback

The following is the API for this mix-in class:

Method / Description

void OnStartDocumentL(
    RDocumentParameters aDocParam,
    TInt aErrorCode)

    Callback to indicate the start of the document.
    aDocParam specifies the various parameters of the document.
    aErrorCode is the error code. If this is not KErrNone then special
    action may be required.

void OnEndDocumentL(
    TInt aErrorCode)

    Indicates the end of the document has been reached.
    aErrorCode is the error code. If this is not KErrNone then special
    action may be required.

void OnStartElementL(
    RTagInfo& aElement,
    RAttributeArray& aAttributes,
    TInt aErrorCode)

    Callback to indicate an element has been parsed.
    aElement is a handle to the element's details.
    aAttributes contains the attributes for the element.
    aErrorCode is the error code. If this is not KErrNone then special
    action may be required.

void OnEndElementL(
    RTagInfo& aElement,
    TInt aErrorCode)

    Callback to indicate the end of the element has been reached.
    aElement is a handle to the element's details.
    aErrorCode is the error code. If this is not KErrNone then special
    action may be required.

void OnContentL(
    const TDesC8& aBytes,
    TInt aErrorCode)

    Sends the content of the element. Not all the content may be returned
    in one go. The data may be sent in chunks. When an OnEndElementL is
    received this means there is no more content to be sent.
    aBytes is the raw content data for the element. The client is
    responsible for converting the data to the required character set if
    necessary. In some instances with WBXML opaque data the content may be
    binary and must not be converted.
    aErrorCode is the error code. If this is not KErrNone then special
    action may be required.


void OnStartPrefixMappingL(
    RString& aPrefix,
    RString& aUri,
    TInt aErrorCode)

    Notification of the beginning of the scope of a prefix-URI Namespace
    mapping. This method is always called before the corresponding
    OnStartElementL method.
    aPrefix is the Namespace prefix being declared.
    aUri is the Namespace URI the prefix is mapped to.
    aErrorCode is the error code. If this is not KErrNone then special
    action may be required.

void OnEndPrefixMappingL(
    RString& aPrefix,
    TInt aErrorCode)

    Notification of the end of the scope of a prefix-URI mapping. This
    method is called after the corresponding OnEndElementL method.
    aPrefix is the Namespace prefix that was mapped.
    aErrorCode is the error code. If this is not KErrNone then special
    action may be required.

void OnIgnoreableWhiteSpaceL(
    const TDesC8& aBytes,
    TInt aErrorCode)

    Notification of ignorable whitespace in element content.
    aBytes are the ignored bytes from the document being parsed.
    aErrorCode is the error code. If this is not KErrNone then special
    action may be required.

void OnSkippedEntityL(
    RString& aName,
    TInt aErrorCode)

    Notification of a skipped entity. If the parser encounters an external
    entity it does not need to expand it - it can return the entity as
    aName for the client to deal with.
    aName is the name of the skipped entity.
    aErrorCode is the error code. If this is not KErrNone then special
    action may be required.

void OnProcessingInstructionL(
    const TDesC8& aTarget,
    const TDesC8& aData,
    TInt aErrorCode)

    Receive notification of a processing instruction.
    aTarget is the processing instruction target.
    aData is the processing instruction data. If empty none was supplied.
    aErrorCode is the error code. If this is not KErrNone then special
    action may be required.

void OnOutOfDataL()

    There is no more data in the data supplier to parse. If there is more
    data to parse, Start() should be called once there is more data in the
    supplier to continue parsing.

void OnError(TInt aError)

    An error has occurred, where aError is the error code.



Sequence Diagrams

Setting up, parsing and generating

The sequence diagram for the parser and generator shows the interaction of
the client with the various parser objects to create a parser and generator
session. The parsing of a simple document with only one element and the
generation of one element are shown. It is assumed that a DTD validator and
an auto correct component are used. Auto correction in this example is only
used with the parser. The generator only checks that tags are DTD compliant
but does not try to correct any DTD errors.
Element not valid at current level in DTD

Auto correction is left up to the plug-in implementers to decide how and
what should be corrected.

The sequence diagram in Figure 4 shows an example of what is possible in the
case where the format of the document is valid but there is an invalid
element (C) that should be at a different level, as shown in the example
document below:

    <A>Content
        <B>
            <C>    // Not valid for the DTD, should be a root element
                Some content
            </C>
        </B>
    </A>
    // <C> should go here

The bad element is detected by the DTD validator and sent to the auto correct
component. The auto corrector realises that this element has an error from the
error code passed in the call-back and tries to find out where the element
should go, and sends back the appropriate OnEndElementL() call-backs to the
client.

Scenarios

Set-up a parser to parse WBXML without any plug-ins

Scenario to parse the following document:

    <A>
        <B>
            Content
        </B>
    </A>

1. The client creates a data supplier that contains the data to be parsed.

2. The client creates an RParserSession passing in the data supplier, the MIME
   type for WBXML, the MIME type of the document to be parsed and the
   call-back pointer where parsing events are to be received.

3. The client begins the parsing by calling Start() on the parser session.

4. The parser makes the following call-backs to the client:

    OnStartDocumentL()
    OnStartElementL('A')
    OnStartElementL('B')
    OnContentL('Content')
    OnEndElementL('B')
    OnEndElementL('A')
    OnEndDocumentL()



Set-up a parser to parse WBXML with a validator plug-in

The same document as in 5.1 is used in this scenario.

1. The client creates a data supplier that contains the data to be parsed.

2. The client constructs an RMarkupPlugins object with the UID of a validator.

3. The client creates an RParserSession passing in the data supplier, the MIME
   type for WBXML, the MIME type of the document to be parsed, the call-back
   pointer where parsing events are to be received and the array of plug-ins
   object.

4. The parser session first iterates through the array of plug-ins starting
   from the end of the list. It creates the CValidator ECOM object, setting
   the call-back to the client. The CWbxmlParser ECOM object is created next
   and its call-back is set to the CValidator object. This sets up the chain
   of call-back events from the parser through to the validator and then the
   client. The validator needs access to data from the parser, so SetParent
   needs to be called on all the plug-ins in the array. The validator sets
   its parent to the parser object.

5. The client begins the parsing by calling Start() on the parser session.

6. The parser makes the following call-backs to the client:

    OnStartDocumentL()
    OnStartElementL('A')
    OnStartElementL('B')
    OnContentL('Content')
    OnEndElementL('B')
    OnEndElementL('A')
    OnEndDocumentL()
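
The chain of call-backs from parser to validator to client can be sketched in standard C++ as follows. This is an illustration only: the Callback interface is cut down to two events, element names are plain strings rather than RTagInfo handles, the error value kErrInvalidElement is invented, and the "parser" is a stub that emits a fixed event sequence. The validator in this sketch simply checks each element name against a set of allowed names, which stands in for real DTD validation.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

constexpr int kErrNone = 0;
constexpr int kErrInvalidElement = -1;  // hypothetical error code

// Cut-down stand-in for MMarkupCallback.
struct Callback {
    virtual ~Callback() = default;
    virtual void OnStartElement(const std::string& name, int errorCode) = 0;
    virtual void OnEndElement(const std::string& name, int errorCode) = 0;
};

// The client sits at the end of the chain and records what it receives.
struct Client : Callback {
    std::vector<std::string> log;
    void OnStartElement(const std::string& name, int errorCode) override {
        log.push_back("start " + name + (errorCode == kErrNone ? "" : " (error)"));
    }
    void OnEndElement(const std::string& name, int errorCode) override {
        log.push_back("end " + name + (errorCode == kErrNone ? "" : " (error)"));
    }
};

// A plug-in is itself a Callback that forwards to the next link.
class Validator : public Callback {
public:
    Validator(Callback& next, std::set<std::string> allowed)
        : next_(next), allowed_(std::move(allowed)) {}

    void OnStartElement(const std::string& name, int errorCode) override {
        next_.OnStartElement(name, Check(name, errorCode));
    }
    void OnEndElement(const std::string& name, int errorCode) override {
        next_.OnEndElement(name, Check(name, errorCode));
    }

private:
    // Keeps an upstream error; otherwise flags names that are not allowed.
    int Check(const std::string& name, int errorCode) const {
        if (errorCode != kErrNone) return errorCode;
        return allowed_.count(name) ? kErrNone : kErrInvalidElement;
    }
    Callback& next_;
    std::set<std::string> allowed_;
};

// Stub parser: reports the events for <A><B></B></A> to the first link.
void EmitSampleDocument(Callback& first) {
    first.OnStartElement("A", kErrNone);
    first.OnStartElement("B", kErrNone);
    first.OnEndElement("B", kErrNone);
    first.OnEndElement("A", kErrNone);
}
```

Because the parser only holds a reference to "the next call-back", it behaves the same whether that is the client itself or the first of several plug-ins.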

Generating a WBXML document with a DTD validator

The document in 5.1 is to be generated in this scenario.

1. The client creates a data supplier with an empty buffer.

2. The client constructs an RMarkupPlugins object with the UID of a validator.

3. The client creates a RParserGenerator passing in the data supplier, MIME type for

4. The generator session first iterates through the array of plug-ins starting
   from the end of the list. It creates the CValidator ECOM object, setting
   the call-back to the client. The CWbxmlGenerator ECOM object is created
   next and its call-back is set to the CValidator object. This sets up the
   chain of call-back events from the generator through to the validator and
   then the client. The validator needs access to data from the parser, so
   SetParent needs to be called on all the plug-ins in the array. The
   validator sets its parent to the parser object.

5. The client then calls the following methods:

    BuildStartDocumentL()
    BuildStartElementL('A')
    BuildStartElementL('B')
    BuildContentL('Content')
    BuildEndElementL('B')
    BuildEndElementL('A')

Design Considerations

• ROM/RAM Memory Strategy - the string pool is used to minimise duplicate
  strings.

• Error condition handling - errors are returned back to plug-ins and the
  client via the call-back API.

• Localisation issues - documents can use any character set and the character
  set is returned back to the client in the case of parsing so it knows how
  to deal with the data. For a generator the client can set the character set
  of the document.

• Performance considerations - the string pool makes string comparisons
  efficient.

• Platform Security - in normal usage the parser and generator do not need
  any capabilities. However, if a plug-in were designed to load a DTD from the
  Internet it would require PhoneNetwork capabilities.

• Modularity - all components in the framework are ECOM components that can
  be replaced or added to in the future.

Testing

The data supplier and parser generator set-up components can be tested
individually - all the functions are synchronous and therefore no active
objects need to be created for testing.

The following steps can be carried out to test parsing and generation of
WBXML or XML:

1. Load a pre-created file.

2. Parse the file.

3. Generate a buffer from the output of the parser.

4. Compare the output of the buffer with the original pre-created file to see
   if they match.

Additional tests are carried out to test error conditions of parsing, such as
badly formatted documents and corrupt documents.
Open Issues

The following issues need to be resolved before this document is completed:

1. If a plug-in requires capabilities to connect to the Internet, what
   capabilities does the framework need?

2. The API for CMarkupCharSetConverter and RDocumentParameters needs to be
   decided.

Glossary

The following technical terms and abbreviations are used within this document.

Term            Definition

XML             Extensible Markup Language

WBXML           WAP Binary Extensible Markup Language

SAX             Simple API for XML

DOM             Document Object Model

Element         This is a tag enclosed by angle brackets, e.g. <Name>,
                <Address>, <Phone> etc.

Attributes      These are the attributes associated with an element, e.g.
                <Phone Type="Mobile">. The attribute here is "Type".

Values          These are the actual values of an attribute, e.g.
                <Phone Type="Mobile">. The value here is "Mobile".

Content         This is the actual content for an element, e.g.
                <Name>Symbian</Name>. Here "Symbian" is the content for the
                element "Name".

DTD             Document Type Definition

MIME            Multipurpose Internet Mail Extensions

Code Page       Since only 32 elements can be defined in WBXML, code pages
                are created so that each code page can have 32 elements.

XSLT            Extensible Style-sheet Language Transformations

SOAP            Simple Object Access Protocol

URI             Uniform Resource Identifiers

qualified name  A qualified name specifies a prefix:local name, e.g. 'HTML:B'

prefix          From the qualified name example this is 'HTML'

local name      From the qualified name example this is 'B'
Appendix A - Auto correction examples

Table A1 shows a situation where the end tags are the wrong way round for A
and B. This is very easy to fix: since the DTD validator keeps a stack of the
tags, it knows what the end tag should be.

    <A>Content
        <B>
            More content
        </A>
    </B>

Table A1: End tags that are the wrong way round

Table A2 shows the situation where the B end tag is missing. Since the end tag
does not match, a guess can be made that there should be an end tag for B
before the end tag of A.

    <A>Content
        <B>
            More content
    </A>

Table A2: Missing end tag

Table A3 shows the situation where there are no end tags for A and B. The DTD
validator will detect the problem and send an end tag for B to the client. The
auto correct component will query the DTD validator if the C tag is valid for
the parent element A. If it is valid an OnStartElementL() will be sent to the
client, otherwise the auto correct component can check further up the element
stack to find where this element is valid. If it is not valid anywhere in the
stack then it will be ignored together with any content and end element tag.

    <A>Content
        <B>
            More content
            <C>
                Some content
            </C>

Table A3: Missing end tags
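
One possible stack-based policy for repairing end tags is sketched below in standard C++. It is an assumption-laden illustration, not the behaviour of any particular auto correct plug-in: the Tag struct and AutoCorrect function are invented names, events are reduced to start and end tags, and no DTD is consulted, so the relocation of element C described for Table A3 is not modelled. The policy is: an end tag that matches an element lower in the stack closes everything above it first, an end tag that matches nothing on the stack is dropped, and anything still open at the end of the input is closed.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Simplified parse event: a start tag or an end tag with a name.
struct Tag {
    bool isEnd;
    std::string name;
};

// Returns a corrected, well-nested sequence of tags as text.
std::vector<std::string> AutoCorrect(const std::vector<Tag>& input) {
    std::vector<std::string> stack;  // currently open elements
    std::vector<std::string> out;

    for (const Tag& tag : input) {
        if (!tag.isEnd) {
            stack.push_back(tag.name);
            out.push_back("<" + tag.name + ">");
            continue;
        }
        // Stray end tag: no matching open element, so ignore it.
        if (std::find(stack.begin(), stack.end(), tag.name) == stack.end()) {
            continue;
        }
        // Close every open element down to and including the match.
        while (!stack.empty()) {
            std::string top = stack.back();
            stack.pop_back();
            out.push_back("</" + top + ">");
            if (top == tag.name) break;
        }
    }
    // Close anything left open at the end of the document.
    while (!stack.empty()) {
        out.push_back("</" + stack.back() + ">");
        stack.pop_back();
    }
    return out;
}
```

Under this policy the inputs of Tables A1 and A2 both come out as A containing B, with B closed before A.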
Appendix B - How to write a namespace plug-in

The tables below show the WBXML tokens for the example namespace. Tables 1 to
3 each represent a static string table. Table 1 shows the elements for code
page 0. Tables 2 and 3 are for attribute value pairs: Table 2 holds the
attribute part and Table 3 the value part. Each attribute index in Table 2
refers to the value of the same index in Table 3. These token values must
match up in Tables 2 and 3. If an attribute does not have a value then there
must be a blank, as shown in Table 3 with token 8. Attribute values also
appear in Table 3 but have a WBXML token value of 128 or greater.

    Element type name       WBXML token
    Addr                    5
    AddType                 6
    Auth                    7
    AuthLevel               8

Table 1: ElementTable0, code page 0

    Attribute name/value pair       WBXML token
    (attribute part)
    TYPE                            6
    TYPE                            7
    NAME                            8
    NAME                            9

Table 2: AttributeValuePairNameTable, code page 0

    Attribute name/value pair       WBXML token
    (value part)
    ADDRESS                         6
    URL                             7
                                    8
    BEARER                          9
    GSM/CSD                         128
    GSM/SMS                         129
    GSM/USSD                        130

The following string table files (.st) are created for each table:

    # Element table for code page 0
    stringtable ElementCodePage0
    EAddr Addr
    EAddType AddType
    EAuth Auth
    EAuthLevel AuthLevel

String table for Table 1

    # Attributes table for code page 0
    stringtable AttributesCodePage0
    EType Type
    EType Type
    EName Name
    EName Name

String table for Table 2

    # Attribute values table for code page 0
    stringtable AttributeValuesCodePage0
    EAddress Address
    EURL URL
    EBearer BEARER
    EGSM_CSD GSM/CSD
    EGSM_SMS GSM/SMS
    EGSM_USSD GSM/USSD

String table for Table 3
Example usage of API

The following example shows how to set up the parser and generator with DTD
checking and auto correction.

    RMarkupPlugins plugins;
    plugins.Append(KMyValidator);
    plugins.Append(KMyAutoCorrector);
    CDescriptorDataSupplier* dataSupplier = CDescriptorDataSupplier::NewLC();
    RParserSession parser;
    parser.OpenL(dataSupplier, MarkupMimeType, DocumentMimeType, callback, plugins);
    parser.Parse();
    // Callback events will be received
    parser.Close();

    // Now construct a generator using the same plug-ins and data supplier
    RGeneratorSession generator;
    generator.OpenL(dataSupplier, MarkupMimeType, DocumentMimeType, callback,
                    plugins);
    generator.BuildStartDocumentL();
    RAttributeArray attributes;
    // Get an RString from the ElementStringTable
    RString string = generator.StringPool().String(ElementStringTable::Tag1, ElementStringTable);
    // Build one element with content
    generator.BuildStartElementL(string, attributes);
    generator.BuildContentL(_L8("This is the content"));
    generator.BuildEndElementL(string);
    generator.BuildEndDocumentL();
    generator.Close();
Appendix 2

How the String Pool is used to parse both text and binary mark-up language

The Mark-up Language framework design relies on the fact that it is possible
(using the 'String Pool' techniques described below) to provide the same
interface to clients no matter if text or binary mark-up language is used.

Text based mark-up languages use strings, i.e. sequences of characters or
binary data. In the String Pool technique, static tables of these strings are
created at compile time, with one string table per namespace, for all the
elements, attributes and attribute values needed to describe a particular type
of mark-up document. Each element, attribute and attribute value is assigned
an integer number and these integer 'handles' form an index of the strings. A
string in an XML document can be rapidly compared to all strings in the string
table by the efficient process of comparing the integer representation of the
string with all of the integer handles in the static string table. The main
benefit of using a string pool for parsing is therefore that it makes it very
easy and efficient for the client to check for what is being parsed, since
handles to strings are used instead of actual strings. This means only
integers are compared rather than many characters, as would be the normal case
if string pools were not used. Also, comparisons can be carried out in a
simple switch statement in the code, making the code efficient, and easier to
read and maintain. Hence, the string pool is used to make string comparisons
efficient at the expense of creation of the strings.

For binary mark-up language (e.g. WBXML) the situation is more complex since
there are no strings in WBXML. In WBXML, everything is tokenised (i.e. given a
token number). We get around the absence of strings as follows: a table of
mappings of each of the WBXML tokens to the index of the string in the string
table is created (see Figure 8). Each mapping is given a unique integer value
- a handle. Since it is required to map from tokens to strings and vice versa,
two lists of integer value handles are created: one indexed on tokens and the
other indexed on the index of the position in the string table. This is so
that it is quick to map from one type to the other. All this is encapsulated
in the namespace plug-in and therefore is insulated from the client, parser
and generator. The client can therefore parse a binary or text document
without having to know about the specific format - it simply uses the integer
handle (RString), which will work correctly for both text and binary mark-up
languages.
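
The idea that text and binary input both resolve to the same integer handle can be sketched in standard C++ as follows. Everything here is an illustrative assumption rather than the Symbian RStringPool API: the StringPool class, the ElementHandle enumeration, and the helper functions are invented, and the element names and token values are taken from Table 1 of Appendix B.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>
#include <unordered_map>
#include <vector>

// Minimal string pool: each distinct string gets one integer handle.
class StringPool {
public:
    int Intern(const std::string& s) {
        auto it = ids_.find(s);
        if (it != ids_.end()) return it->second;
        int id = static_cast<int>(strings_.size());
        strings_.push_back(s);
        ids_.emplace(s, id);
        return id;
    }
    const std::string& Lookup(int id) const { return strings_.at(id); }

private:
    std::vector<std::string> strings_;
    std::unordered_map<std::string, int> ids_;
};

// Handles in the same order as the static element table.
enum ElementHandle : int { EAddr, EAddType, EAuth, EAuthLevel };

// Loads the static table so that handles match ElementHandle.
StringPool MakeElementPool() {
    StringPool pool;
    for (const char* name : {"Addr", "AddType", "Auth", "AuthLevel"}) {
        pool.Intern(name);
    }
    return pool;
}

// Text mark-up: the tag name is looked up in the pool.
int HandleFromText(StringPool& pool, const std::string& tagName) {
    return pool.Intern(tagName);
}

// Binary mark-up: the WBXML token is mapped to the same handle (-1 if unknown).
int HandleFromToken(int wbxmlToken) {
    static const std::unordered_map<int, int> tokenToIndex = {
        {5, EAddr}, {6, EAddType}, {7, EAuth}, {8, EAuthLevel}};
    auto it = tokenToIndex.find(wbxmlToken);
    return it == tokenToIndex.end() ? -1 : it->second;
}

// Client code: one switch on the handle serves both formats.
std::string Describe(int handle) {
    switch (handle) {
        case EAddr:      return "address";
        case EAddType:   return "address type";
        case EAuth:      return "authentication";
        case EAuthLevel: return "authentication level";
        default:         return "unknown";
    }
}
```

The client's switch statement never sees whether the handle came from the text tag "Auth" or from WBXML token 7.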
CLAIMS

1. A portable computing device programmed with a mark-up language parser or
   generator that can access data from a source using a generic data supplier
   API.

2. The device of Claim 1 in which the source is a buffer in memory.

3. The device of Claim 1 in which the source is a file.

4. The device of Claim 1 in which the source is a socket outputting streaming
   data.

5. The device of any preceding claim which uses a data source different from
   those data sources which the device was capable of using when first
   operated by an end-user.
ABSTRACT

MARK-UP LANGUAGE FRAMEWORK WITH GENERIC HANDLING OF DATA SOURCES

A portable computing device programmed with a mark-up language parser or
generator that can access data from a source using a generic data supplier
API. Hence, the parser or generator is insulated from having to talk directly
to a data source; instead, it does so via a generic data supplier API, acting
as an intermediary layer. This de-couples the parser or generator from the
data source and hence means that the parser or generator no longer has to be
hard coded for a specific data supplier. This in turn leads to a
simplification of the parser and generator design. The present invention
allows parsing and generation to be carried out with any data source. For
example, a buffer in memory could be used, as could a file, as could streaming
from a socket (hence enabling the ability to parse in real-time from data
streamed over the internet).
Fig 1: Block diagram of mark-up framework with four plug-ins

Fig 2: Block diagram of a client parsing using a DTD validator and auto

Fig 3: Block diagram of a client using a generator with a DTD validator and auto

Fig 6: Sequence diagram for parser and generator

Fig 7: Sequence diagram showing DTD validation and auto correction

Fig 8: WBXML token of elements mapping to string table of elements in the namespace plug-in.
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